Language model parameter estimation using user transcriptions

Author

Bo-June Hsu

Abstract

In limited data domains, many effective language modeling techniques construct models with parameters to be estimated on an in-domain development set. However, in some domains, no such data exist beyond the unlabeled test corpus. In this work, we explore the iterative use of the recognition hypotheses for unsupervised parameter estimation. We also evaluate the effectiveness of supervised adaptation using varying amounts of user-provided transcripts of utterances selected via multiple strategies. While unsupervised adaptation obtains 80% of the potential error reductions, it is outperformed by using only 300 words of user transcription. By transcribing the lowest confidence utterances first, we further obtain an effective word error rate reduction of 0.6%.
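The lowest-confidence-first selection strategy mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how confidence is computed or how the transcription budget is counted, so the `Utterance` type, the `confidence` field, and the `select_for_transcription` helper below are all hypothetical, and the budget is approximated by the word count of the recognition hypothesis.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Utterance:
    utt_id: str
    hypothesis: str    # recognizer output for this utterance
    confidence: float  # assumed utterance-level confidence score in [0, 1]


def select_for_transcription(utterances: List[Utterance],
                             word_budget: int = 300) -> List[Utterance]:
    """Pick utterances in ascending confidence order until the budget is hit.

    Assumption: the cost of transcribing an utterance is estimated by the
    number of words in its recognition hypothesis.
    """
    selected = []
    words_used = 0
    for utt in sorted(utterances, key=lambda u: u.confidence):
        n_words = len(utt.hypothesis.split())
        if words_used + n_words > word_budget:
            break  # stop at the first utterance that would exceed the budget
        selected.append(utt)
        words_used += n_words
    return selected


utts = [
    Utterance("a", "one two three", 0.9),
    Utterance("b", "four five", 0.2),
    Utterance("c", "six seven eight nine", 0.5),
]
chosen = select_for_transcription(utts, word_budget=6)
chosen_ids = [u.utt_id for u in chosen]
```

The selected utterances would then be transcribed by the user and used as a small in-domain development set for tuning the language model parameters. Stopping at the first utterance that exceeds the budget is one simple choice; skipping it and continuing with shorter utterances would be another.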


Similar resources

A Generative Dependency N-gram Language Model: Unsupervised Parameter Estimation and Application

We design a language model based on a generative dependency structure for sentences. The parameter of the model is the probability of a dependency N-gram, which is composed of lexical words with four types of extra tag used to model the dependency relation and valence. We further propose an unsupervised expectation-maximization algorithm for parameter estimation, in which all possible dependenc...


Estimation of Speaking Style in Speech Corpora Focusing on speech transcriptions

Recent developments in computer technology have allowed the construction and widespread application of large-scale speech corpora. To foster ease of data retrieval for people interested in utilising these speech corpora, we attempt to characterise speaking style across some of them. In this paper, we first introduce the 3 scales of speaking style proposed by Eskenazi in 1993. We then use morpho...


Document image decoding approach to character template estimation

Gary E. Kopec (Xerox Palo Alto Research Center) and Mauricio Lomelin (Microsoft Corp.), November 29, 1995. This paper develops an approach to supervised training of character templates from page images and unaligned transcriptions. The template estimation problem is formulated as one of constrained maximum likelihood parameter estimation within the document image decoding...


Large-scale Inversion of Magnetic Data Using Golub-Kahan Bidiagonalization with Truncated Generalized Cross Validation for Regularization Parameter Estimation

In this paper a fast method for large-scale sparse inversion of magnetic data is considered. The L1-norm stabilizer is used to generate models with sharp and distinct interfaces. To deal with the non-linearity introduced by the L1-norm, a model-space iteratively reweighted least squares algorithm is used. The original model matrix is factorized using the Golub-Kahan bidiagonalization that proje...


Parameter Estimation of Loranz Chaotic Dynamic System Using Bees Algorithm

An important problem in nonlinear science is the estimation of unknown parameters in the Lorenz chaotic system. Parameter estimation for chaotic systems is a multidimensional continuous optimization problem, where the optimization goal is to minimize the mean squared errors (MSEs) between real and estimated responses for a number of given samples. The Bees algorithm (BA) is a new member of me...



Publication date: 2009
